Background: Sickle cell disease (SCD) is the most common inherited blood disorder in the United States. The ASH Research Collaborative (ASH RC) is building an SCD Data Hub (DH) to facilitate research and quality improvement using real-world data. The DH ingests multi-site electronic health record (EHR) data, which are rich in clinical context and longitudinal assessments. SCD diagnosis can be ascertained in the EHR from any encounter across the health system. Diagnostic accuracy is an essential data quality element in EHR-based research. We report here the first analyses from the SCD DH, focusing on the demographics of the cohort and accurate identification of SCD diagnosis type.
Methods: SCD DH sites have a data use agreement with the ASH RC, and either a reliance agreement with the Western-Copernicus Group Institutional Review Board (IRB) or local IRB approval to extract and transfer EHR data at least quarterly using the Observational Medical Outcomes Partnership (OMOP) common data model. The SCD DH was supplemented with a site principal investigator (PI) attestation of SCD diagnosis based on existing data sources (e.g., local registries, clinic notes, problem lists) deemed sufficient by the PI. Such cases are designated Investigator-Verified. A random sample of 25 patients per site with Investigator-Verified SCD types and at least one encounter in 2022 was evaluated further at 10 sites in a Data Validation Pilot Project. SCD type and other variables were manually abstracted by site personnel and reported to the DH through REDCap. For each data item abstracted, source locations were recorded. The Investigator-Verified and manually abstracted SCD diagnosis types were compared. Concordance was calculated as the percentage of cases in which the Investigator-Verified and manually abstracted methods determined the same SCD type.
Results: Of the 56 sites enrolled in the DH, 20 are currently submitting data. As of January 31, 2024, the DH included data for 24,581 unique individuals with SCD. The geographic distribution of sites aligns well with the national distribution of the SCD population. Sites were urban (95%) or suburban (5%), and all were academic medical centers. Programs were self-designated as pediatric (45%), adult (40%), or combined (15%). The median patient age was 23 years (range: birth to 75 years); 55.4% were female.
The SCD DH contained 9,484 patients across 12 sites with Investigator-Verified SCD diagnosis types. SCD types were reported as HbSS (67.2%), HbSC (22.2%), HbSβ⁰ thalassemia (2.1%), HbSβ⁺ thalassemia (6.1%), or HbS/Other (2.4%). In the Data Validation Pilot Project, SCD type was manually abstracted in a random sample of 243 patients at 10 sites. Thirty-six patients, for whom the Investigator-Verified diagnosis was “unable to determine”, were excluded from the concordance analysis.
Of the 207 evaluable cases, 193 had concordant SCD diagnosis types, for an overall concordance rate of 93%. Five of nine sites had 100% concordance. There were 14 cases (7%) in which the manually abstracted SCD type was discordant with the Investigator-Verified SCD diagnosis. Discordant types were most often reported as HbSS in the Investigator-Verified Diagnosis Project, and discordance was higher for patients with less common SCD types, such as HbSβ⁺ thalassemia. Source locations reported for manual abstraction of SCD diagnosis included hemoglobin electrophoresis, HPLC, DNA testing, patient registry, newborn screening, and clinical documentation. Determination of concordance between other automatically extracted and manually abstracted data elements is now underway for this validation study subset, the larger Investigator-Verified cohort, and the full DH SCD population to generate EHR-derived computable phenotype algorithms.
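As an illustrative check of the concordance arithmetic, the following Python sketch recomputes the rate from the counts stated in the Results (243 sampled, 36 excluded as "unable to determine", 193 concordant). The function name `concordance`, the placeholder labels "A" and "B", and the synthetic pairs are assumptions made for illustration only; they reproduce the reported counts but are not the study's data or code.

```python
import math

UNDETERMINED = "unable to determine"


def concordance(pairs):
    """Summarize agreement between two SCD type determinations.

    `pairs` is an iterable of (investigator_verified, abstracted) labels.
    Pairs whose investigator-verified label is "unable to determine" are
    excluded, mirroring the exclusion described in the abstract.
    Returns (n_concordant, n_evaluable, percent_concordant).
    """
    evaluable = [(v, a) for v, a in pairs if v != UNDETERMINED]
    n_concordant = sum(1 for v, a in evaluable if v == a)
    if not evaluable:
        return 0, 0, math.nan
    return n_concordant, len(evaluable), 100.0 * n_concordant / len(evaluable)


# Synthetic pairs that only reproduce the reported counts.
# "A" and "B" are placeholder labels, not actual SCD types from the study.
pairs = (
    [("A", "A")] * 193             # concordant cases
    + [("A", "B")] * 14            # discordant cases
    + [(UNDETERMINED, "A")] * 36   # excluded from the analysis
)

agree, n, pct = concordance(pairs)
print(f"{agree}/{n} concordant = {pct:.1f}%")  # 193/207 concordant = 93.2%
```

With these counts the unrounded rate is 193/207, about 93.2%, and the discordant share is 14/207, about 6.8%, consistent with the rounded 93% and 7% reported.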
Conclusions: The ASH RC DH is one of the largest SCD data resources in the United States. Investigator-Verified diagnosis had high concordance with manual chart abstraction in a random sample of patients across 10 sites. This concordance increases confidence in the validity of both existing and future SCD DH data. Well-curated and transformed data on a large cohort of individuals with SCD will be leveraged to conduct observational studies and to inform the design and feasibility of prospective interventional studies, with the aim of generating high-quality, granular real-world evidence and improving outcomes for this complex patient population.
Thompson: Novartis: Research Funding; CRISPR/Vertex: Consultancy, Research Funding; Global Blood Therapeutics: Divested equity in a private or publicly-traded company in the past 24 months; Editas: Consultancy, Research Funding; bluebird bio: Consultancy, Research Funding; Beam Therapeutics: Consultancy, Research Funding. Neuberg: Madrigal Pharmaceutical: Current equity holder in publicly-traded company. Brandow: Pfizer: Other: Adjudication committee for clinical trial. King: UpToDate: Patents & Royalties; Cigna: Consultancy. Lanzkron: Glycomimetics: Consultancy; CSL-Behring: Research Funding; HRSA: Research Funding; bluebird bio: Membership on an entity's Board of Directors or advisory committees; Pfizer: Current holder of stock options in a privately-held company; Merck: Consultancy; Novo Nordisk: Membership on an entity's Board of Directors or advisory committees; Novartis: Consultancy, Research Funding; Takeda: Research Funding; Pfizer: Consultancy; Agios: Membership on an entity's Board of Directors or advisory committees; Teva: Current holder of stock options in a privately-held company; PCORI: Research Funding. Wood: Pfizer: Research Funding; Genetech: Research Funding; ASH Research Collaborative: Consultancy; Koneksa Health: Consultancy, Current equity holder in publicly-traded company; Teledoc Health: Consultancy.